This page last changed on Jan 18, 2009 by straha1.

QDel: Canceling a Job

Occasionally you might realize that you messed up an input parameter, typed the wrong executable name, or made some other mistake. Rather than letting your incorrectly configured job run, you can cancel it using the qdel command:

qdel 3172.hpc.cl.rs.umbc.edu

where "3172.hpc.cl.rs.umbc.edu" is the job number returned from qsub. (If you forgot your job number, you can use qstat to determine what it is.) qdel can cancel your job even after it has started running. It may take a minute or two for your job to be deleted from the queue. You can use qstat to monitor the progress of the deletion.
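
The cancel-and-monitor sequence described above can be scripted. The following is only a minimal sketch: the function name cancel_and_wait, the poll interval and the check limit are hypothetical choices, and it assumes that qstat exits with a nonzero status once the job is no longer known to the server.

```shell
#!/bin/sh
# cancel_and_wait is a hypothetical helper name, not part of PBS.
# Assumption: qstat exits nonzero once the job id is unknown to the server.
cancel_and_wait() {
    job_id=$1
    interval=${2:-10}     # seconds between checks (arbitrary default)
    max_checks=${3:-30}   # give up after this many checks (arbitrary default)

    # Ask PBS to delete the job; stop here if qdel itself fails.
    qdel "$job_id" || return 1

    checks=0
    # Keep polling while qstat still reports the job.
    while qstat "$job_id" >/dev/null 2>&1; do
        checks=$((checks + 1))
        if [ "$checks" -ge "$max_checks" ]; then
            echo "Job $job_id is still listed after $checks checks" >&2
            return 2
        fi
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}
```

With the function loaded into your shell, a call might look like this:

cancel_and_wait 3172.hpc.cl.rs.umbc.edu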

QStat: Job Status Information

Examining Your Jobs

Your job might be sitting in the queue for a while before it runs, depending on how many people are using the cluster. You can check the status of your job using qstat:

qstat 3172.hpc.cl.rs.umbc.edu

where 3172.hpc.cl.rs.umbc.edu should be replaced by whatever job number qsub returned. If your job is in the queue or running, that command should print out a message much like this:

Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3172.hpc            hello_parallel   straha1                0 R low_priority

where straha1 is replaced by your user name. The R indicates that your job is running. If you see a Q there, then your job is in the queue waiting to run. If qstat gives you this message:

qstat: Unknown Job Id 3172.hpc.cl.rs.umbc.edu

then your job has either aborted, completed normally, or been deleted. You can get much more detailed information about your job using the -f option to qstat:

qstat -f 3172.hpc.cl.rs.umbc.edu

which will print out extensive information, including the number of nodes used, the number of processors per node, which nodes were allocated, the queue, and much more.
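
The status checks in this section can be wrapped in a small helper. This is a sketch under stated assumptions: describe_state is a hypothetical function name, and the parsing assumes the default column layout shown above, with the job number at the start of the first column and the state letter in the second-to-last column. It reads qstat output on standard input, so it can also be tried on saved output.

```shell
#!/bin/sh
# describe_state is a hypothetical helper name, not part of PBS.
# It reads short-form qstat output on stdin and reports the state of
# the job whose numeric id is given as the first argument.
describe_state() {
    job_number=$1
    awk -v id="$job_number" '
        # The first column looks like "3172.hpc"; compare its numeric part.
        split($1, parts, ".") > 0 && parts[1] == id && NF >= 6 {
            state = $(NF - 1)   # state letter sits just before the queue name
            found = 1
            exit                # jump straight to the END block
        }
        END {
            if (!found)            print "not listed (aborted, completed or deleted)"
            else if (state == "R") print "running"
            else if (state == "Q") print "queued"
            else                   print "state " state
        }
    '
}
```

A possible invocation, discarding the "Unknown Job Id" message so that only the summary is printed:

qstat 3172.hpc.cl.rs.umbc.edu 2>/dev/null | describe_state 3172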

Examining the PBS Queue

You can see the list of all jobs in the queue by simply typing qstat (without any job number or options), which might produce something like this:

Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3166.hpc            MPI_DG           gobbert         00:13:18 R low_priority
3167.hpc            MPI_DG           gobbert         00:41:01 R low_priority
3168.hpc            MPI_DG           gobbert         01:33:29 R low_priority
3171.hpc            llcbench         straha1         00:13:24 R low_priority
3172.hpc            hello_parallel   straha1                0 Q low_priority

You can see details about other people's jobs using the same qstat -f command described in the previous section. If you notice that the cluster is especially busy, you may wish to wait before trying to debug a new MPI program. Otherwise, you might be waiting an hour or more every time you start the program.
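
To judge how busy the cluster is at a glance, the listing can be summarized by counting running and queued jobs. The sketch below is illustrative only: queue_summary is a hypothetical function name, and it assumes the two header lines and the column layout shown in the listing above.

```shell
#!/bin/sh
# queue_summary is a hypothetical helper name, not part of PBS.
# It reads short-form qstat output on stdin and counts jobs by state.
queue_summary() {
    awk '
        NR > 2 && NF >= 6 {                 # skip the two header lines
            if ($(NF - 1) == "R")      running++
            else if ($(NF - 1) == "Q") queued++
        }
        END { printf "running=%d queued=%d\n", running, queued }
    '
}
```

It could be used like this:

qstat | queue_summary

For the example listing above, this would print running=4 queued=1.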
